14 research outputs found

    A Bag-of-Tasks Scheduler Tolerant to Temporal Failures in Clouds

    Cloud platforms have emerged as a prominent environment for executing high-performance computing (HPC) applications, providing on-demand resources as well as scalability. They usually offer different classes of Virtual Machines (VMs) with different guarantees in terms of availability and volatility, provisioning the same resource through multiple pricing models. For instance, in the Amazon EC2 cloud, the user pays per hour for on-demand VMs, while spot VMs are unused instances available at a lower price. Despite the monetary advantages, a spot VM can be terminated, stopped, or hibernated by EC2 at any moment. Using both hibernation-prone spot VMs (to reduce cost) and on-demand VMs, we propose in this paper a static scheduling for HPC applications composed of independent tasks (bag-of-tasks) with deadline constraints. If a spot VM hibernates and does not resume within a time that guarantees the application's deadline, a temporal failure takes place. Our scheduling thus aims at minimizing the monetary cost of bag-of-tasks applications in the EC2 cloud while respecting their deadlines and avoiding temporal failures. To this end, our algorithm statically creates two scheduling maps: (i) the first contains, for each task, its starting time and the VM on which it should execute (i.e., an available spot or on-demand VM with the current lowest price); (ii) the second contains, for each task allocated to a spot VM in the first map, its starting time and the on-demand VM on which it should execute to meet the application deadline and avoid temporal failures. The second map is used whenever the hibernation period of a spot VM exceeds a time limit. Performance results from simulation with task execution traces, the configuration of Amazon EC2 VM classes, and VM market history confirm the effectiveness of our scheduling and show that it tolerates temporal failures.
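
The two-map idea can be illustrated with a minimal sketch. It is not the authors' algorithm: the greedy rule (cheapest VM that still meets the deadline), the `VM` class, the `build_maps` function, and the prices are assumptions made for illustration only. The backup map simply places each spot-allocated task on an on-demand VM after that VM's primary load.

```python
from dataclasses import dataclass

@dataclass
class VM:
    name: str
    kind: str              # "spot" or "on-demand"
    price: float           # price per hour (hypothetical values)
    free_at: float = 0.0   # time at which the VM becomes idle

def build_maps(durations, vms, deadline):
    """Return (primary, backup); each maps task -> (VM name, start time)."""
    primary, backup = {}, {}
    # First map: cheapest VM that can still finish the task by the deadline.
    for task, dur in durations.items():
        fits = [vm for vm in vms if vm.free_at + dur <= deadline]
        if not fits:
            raise ValueError(f"no VM can finish {task} before the deadline")
        vm = min(fits, key=lambda v: v.price)
        primary[task] = (vm.name, vm.free_at)
        vm.free_at += dur
    # Second map: each task placed on a spot VM gets an on-demand fallback,
    # scheduled after the on-demand VM's primary load (an assumed policy).
    by_name = {vm.name: vm for vm in vms}
    on_demand = [vm for vm in vms if vm.kind == "on-demand"]
    for task, (vm_name, _) in primary.items():
        if by_name[vm_name].kind != "spot":
            continue
        dur = durations[task]
        fits = [vm for vm in on_demand if vm.free_at + dur <= deadline]
        if not fits:
            raise ValueError(f"no on-demand fallback for {task}")
        vm = min(fits, key=lambda v: v.free_at)
        backup[task] = (vm.name, vm.free_at)
        vm.free_at += dur
    return primary, backup
```

For example, `build_maps({"t1": 2, "t2": 2}, [VM("s1", "spot", 0.03), VM("od1", "on-demand", 0.10)], deadline=6)` places both tasks on the spot VM in the first map and reserves slots for them on `od1` in the second.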

    Implémentation et test d'un ordonnanceur Weighted Fair Queuing pour des requêtes d'E/S

    This report describes the work conducted by Alessa Mayer during a two-month internship at the Inria center of the University of Bordeaux, as a member of the Tadaam team. Her advisors were Luan Teylo and Francieli Boito. The goal of the internship was to implement and test the weighted fair queuing (WFQ) scheduling algorithm applied to I/O requests, and to integrate it into the AGIOS I/O scheduling library. That scheduler will be used to implement the I/O Sets method, proposed by members of the team in a recent paper.
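
For readers unfamiliar with weighted fair queuing, here is a minimal sketch of the general technique. It is a self-clocked approximation (virtual time is taken from the finish tag of the last dispatched request), not the implementation from the report and not the AGIOS API; the class and method names are hypothetical.

```python
import heapq
import itertools

class WFQScheduler:
    """Dispatch requests in order of increasing virtual finish tag."""

    def __init__(self, weights):
        self.weights = dict(weights)                     # queue id -> weight
        self.last_finish = {q: 0.0 for q in self.weights}
        self.virtual_time = 0.0
        self._heap = []
        self._seq = itertools.count()                    # FIFO tie-breaker

    def enqueue(self, queue_id, request, size):
        # A request starts when its queue's previous request finishes,
        # or at the current virtual time if the queue was idle.
        start = max(self.virtual_time, self.last_finish[queue_id])
        finish = start + size / self.weights[queue_id]   # heavier weight -> earlier tag
        self.last_finish[queue_id] = finish
        heapq.heappush(self._heap, (finish, next(self._seq), queue_id, request))

    def dequeue(self):
        finish, _, queue_id, request = heapq.heappop(self._heap)
        self.virtual_time = finish                       # self-clocking step
        return queue_id, request
```

With weights `{"A": 2, "B": 1}` and equally sized requests, queue `A` is served roughly twice as often as queue `B` while both are backlogged.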

    The role of storage target allocation in applications' I/O performance with BeeGFS

    Parallel file systems are at the core of HPC I/O infrastructures. Those systems minimize the I/O time of applications by separating files into fixed-size chunks and distributing them across multiple storage targets. Therefore, the I/O performance experienced with a PFS is directly linked to the capacity to retrieve these chunks in parallel. In this work, we conduct an in-depth evaluation of the impact of the stripe count (the number of targets used for striping) on the write performance of BeeGFS, one of the most popular parallel file systems today. We consider different network configurations and show the fundamental role played by this parameter, in addition to the number of compute nodes, processes and storage targets. Through a rigorous experimental evaluation, we directly contradict conclusions from related work. Notably, we show that sharing I/O targets does not lead to performance degradation and that applications should use as many storage targets as possible. Our recommendations have the potential to significantly improve the overall write performance of BeeGFS deployments and also provide valuable information for future work on storage target allocation and stripe count tuning.
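
The effect of the stripe count can be seen in a small sketch of chunk striping. It assumes a plain round-robin placement of fixed-size chunks over the selected targets; the function name and target labels are hypothetical, and the exact layout used by a given BeeGFS deployment may differ.

```python
def bytes_per_target(offset, length, chunk_size, stripe_targets):
    """Split the byte range [offset, offset + length) over striped targets."""
    per_target = {t: 0 for t in stripe_targets}
    pos, end = offset, offset + length
    while pos < end:
        chunk = pos // chunk_size                         # index of the chunk holding pos
        target = stripe_targets[chunk % len(stripe_targets)]  # round-robin (assumed)
        chunk_end = (chunk + 1) * chunk_size
        n = min(end, chunk_end) - pos                     # bytes of this chunk in range
        per_target[target] += n
        pos += n
    return per_target
```

A larger stripe count spreads the same write over more targets, which is what lets the chunks be transferred in parallel.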

    A dynamic task scheduler tolerant to multiple hibernations in cloud environments

    Cloud platforms usually offer several types of Virtual Machines (VMs) with different guarantees in terms of availability and volatility, provisioning the same resource through multiple pricing models. For instance, in the Amazon EC2 cloud, the user pays per use for on-demand VMs, while spot VMs are instances available at lower prices. However, a spot VM can be terminated or hibernated by EC2 at any moment. In this work, we propose the Hibernation-Aware Dynamic Scheduler (HADS), which schedules Bag-of-Tasks (BoT) applications with deadline constraints on both hibernation-prone spot VMs and on-demand VMs. HADS aims at minimizing the monetary cost of executing BoT applications on clouds while ensuring that their deadlines are respected even in the presence of multiple hibernations. Results collected from experiments on Amazon EC2 VMs using synthetic applications and a NAS benchmark application show the effectiveness of HADS in terms of monetary cost when compared to solutions that use only on-demand VMs.

    IO-SETS: Simple and efficient approaches for I/O bandwidth management

    One of the main performance issues faced by high-performance computing platforms is the congestion caused by concurrent I/O from applications. When this happens, the platform's overall performance and utilization are harmed. Extensive work in this field has established I/O scheduling as the essential solution to this problem. The main drawback of current techniques is the amount of information they need about applications, which compromises their applicability. In this paper, we propose a novel method for I/O management, IO-SETS. We present its potential through a scheduling heuristic called SET-10, which requires minimal information and can be easily implemented.

    I/O performance of multiscale finite element simulations on HPC environments

    In this paper, we present MSLIO, a code to mimic the I/O behavior of multiscale simulations. Such an I/O kernel is useful for HPC research, as it can be executed more easily and more efficiently than the full simulations when researchers are interested in the I/O load only. We validate MSLIO by comparing it to the I/O performance of an actual simulation, and we then use it to test some possible improvements to the output routine of the MHM (Multiscale Hybrid Mixed) library.

    A Hibernation Aware Dynamic Scheduler for Cloud Environments

    Nowadays, cloud platforms usually offer several types of Virtual Machines (VMs) with different guarantees in terms of availability and volatility, provisioning the same resource through multiple pricing models. For instance, in the Amazon EC2 cloud, the user pays per hour for on-demand VMs, while spot VMs are unused instances available at a lower price. Despite the monetary advantages, a spot VM can be terminated or hibernated by EC2 at any moment. In this work, we propose the Hibernation-Aware Dynamic Scheduler (HADS) to schedule applications composed of independent tasks (bag-of-tasks) with deadline constraints on both hibernation-prone spot VMs (to reduce cost) and on-demand VMs. We also consider the problem of temporal failures, which occur when a spot VM hibernates and does not resume within a time that guarantees the application's deadline. Our dynamic scheduling approach aims at minimizing the monetary cost of executing bag-of-tasks applications while respecting their deadlines even in the presence of hibernation. It also avoids temporal failures by using task migration and work-stealing techniques. Experimental results with real executions using Amazon EC2 VMs confirm the effectiveness of our scheduling, in terms of monetary cost and execution time, when compared with approaches based only on on-demand VMs. They also show that our strategy can tolerate temporal failures.
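
The notion of a temporal failure reduces to simple arithmetic, sketched below. This is only an illustration of the definition given in the abstract, not the HADS decision logic; the function names and the `overhead` parameter (time needed to move work to another VM) are assumptions.

```python
def latest_safe_time(deadline, remaining_work, overhead=0.0):
    """Latest moment at which the remaining work can still start and meet the deadline."""
    return deadline - remaining_work - overhead

def must_migrate(now, deadline, remaining_work, overhead=0.0):
    """True once waiting any longer for the hibernated VM would miss the deadline."""
    return now >= latest_safe_time(deadline, remaining_work, overhead)
```

For a deadline at t = 100 with 30 time units of work left and a migration overhead of 5, the hibernated tasks must be moved no later than t = 65.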

    A Bag-of-Tasks Scheduler Tolerant to Temporal Failures in Clouds

    Cloud platforms offer different types of virtual machines with different guarantees in terms of availability and volatility, provisioning the same resource through multiple pricing models. For instance, in the Amazon EC2 cloud, the user pays per hour for on-demand instances, while spot instances are unused resources available at a lower price. Despite the monetary advantages, a spot instance can be terminated or hibernated by EC2 at any moment. Using both hibernation-prone spot instances (to reduce cost) and on-demand instances, we propose in this paper a static scheduling for applications composed of independent tasks (bag-of-tasks) with deadline constraints. If a spot instance hibernates and does not resume within a time that guarantees the application's deadline, a temporal failure takes place. Our scheduling thus aims at minimizing the monetary cost of bag-of-tasks applications in the EC2 cloud while respecting their deadlines and avoiding temporal failures. Performance results with task execution traces, the configuration of Amazon EC2 virtual machines, and EC2 market history confirm the effectiveness of our scheduling and show that it tolerates temporal failures.